perf: faster model resolution, JSON decoding, and request snapshotting #413

Merged
SantiagoDePolonia merged 3 commits into main from perf/optimization on Jun 18, 2026

@SantiagoDePolonia SantiagoDePolonia commented Jun 18, 2026

Summary

Reduces per-request CPU and allocations on the gateway hot path. All changes are behavior-preserving for valid input; the one intentional difference (goccy input leniency) is documented and pinned by a test. Verified with the full make test-race suite, make lint, and the hot-path perf guard (all green via pre-commit hooks).

Changes

1. O(1) model resolution

Previously, the router resolved the model selector ~6× per request, and each qualified resolution copied the entire model catalog and scanned it linearly (ListModelsWithProvider).

  • Added a lazy provider-selector index to the registry (qualifiedByName / qualifiedByType), built once and cleared at the existing single invalidation point.
  • Routed resolution through it via an optional qualifiedSelectorResolver interface; the catalog scan remains as a fallback for non-indexed lookups and raw slash-shaped model IDs.
  • Deduplicated the now-redundant name/type scans in resolveQualifiedSelector.

Measured: resolution is now O(1) and constant in catalog size.

| Catalog | Before | After |
| --- | --- | --- |
| 300 models | 31,454 ns / 164 KB / req | 800 ns / 0.3 KB / req |
| 1000 models | 95,930 ns / 459 KB / req | 814 ns / 0.3 KB / req |
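
A minimal sketch of how such a lazily built index could look, under stated assumptions: the `model` and `registry` types, `selectorKey`, `setModels`, and the first-entry-wins collision policy are all hypothetical simplifications, not the repository's code. Only the map names `qualifiedByName` / `qualifiedByType` come from the description above.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// model is a hypothetical, trimmed-down catalog entry.
type model struct {
	ProviderName string
	ProviderType string
	ID           string
}

// registry keeps the catalog plus two lazily built lookup maps.
type registry struct {
	mu              sync.RWMutex
	models          []model
	qualifiedByName map[string]model // nil means "not built yet"
	qualifiedByType map[string]model
}

// selectorKey joins a provider segment and a model ID into one map key.
func selectorKey(segment, id string) string {
	return strings.TrimSpace(segment) + "/" + strings.TrimSpace(id)
}

// putIfAbsent keeps the first entry on a key collision (an assumed policy).
func putIfAbsent(idx map[string]model, k string, m model) {
	if _, exists := idx[k]; !exists {
		idx[k] = m
	}
}

// setModels replaces the catalog and clears the index in one place.
func (r *registry) setModels(models []model) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.models = models
	r.qualifiedByName, r.qualifiedByType = nil, nil
}

// buildIndexLocked fills both maps; the caller must hold the write lock.
func (r *registry) buildIndexLocked() {
	r.qualifiedByName = make(map[string]model, len(r.models))
	r.qualifiedByType = make(map[string]model, len(r.models))
	for _, m := range r.models {
		putIfAbsent(r.qualifiedByName, selectorKey(m.ProviderName, m.ID), m)
		putIfAbsent(r.qualifiedByType, selectorKey(m.ProviderType, m.ID), m)
	}
}

// lookupLocked checks the name index first, then the type index.
func (r *registry) lookupLocked(k string) (model, bool) {
	if m, ok := r.qualifiedByName[k]; ok {
		return m, true
	}
	m, ok := r.qualifiedByType[k]
	return m, ok
}

// resolve answers from the index, building it on first use after invalidation.
func (r *registry) resolve(segment, id string) (model, bool) {
	k := selectorKey(segment, id)

	r.mu.RLock()
	built := r.qualifiedByName != nil
	m, ok := r.lookupLocked(k)
	r.mu.RUnlock()
	if built {
		return m, ok
	}

	r.mu.Lock()
	defer r.mu.Unlock()
	if r.qualifiedByName == nil { // another goroutine may have built it meanwhile
		r.buildIndexLocked()
	}
	return r.lookupLocked(k)
}

func main() {
	r := &registry{}
	r.setModels([]model{{ProviderName: "primary", ProviderType: "openai", ID: "gpt-5-mini"}})
	m, ok := r.resolve("openai", "gpt-5-mini")
	fmt.Println(m.ProviderName, ok)
}
```

After the first lookup, every resolution is a read-locked map access, so its cost does not depend on the number of catalog entries.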

2. JSON decoding → goccy/go-json

  • Migrated internal/ + cmd/ from encoding/json to github.com/goccy/go-json (true drop-in; package is named json). gjson unchanged. Test files intentionally stay on encoding/json as a stdlib oracle.
  • ~3.8× faster realistic chat-body decode (39,000 → 10,300 ns) with fewer allocations.
  • Dropped the redundant gjson.ValidBytes walk in extractUnknownJSONFields (callers already validate via the preceding Unmarshal).

Behavior note: goccy is slightly more lenient than stdlib on a couple of malformed inputs (leading-zero numbers; malformed values inside skipped passthrough fields). Accepted under the gateway's "accept generously" (Postel's Law) principle and pinned by TestDecoderLeniencyIsBounded. All valid input decodes identically.
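
One way to pin this kind of leniency is to compare a candidate decoder against encoding/json as the oracle and assert that the set of divergent inputs stays fixed. The sketch below illustrates that idea only; it is not the body of TestDecoderLeniencyIsBounded. `unmarshalFunc`, `divergences`, and especially `acceptAll` are hypothetical names, and `acceptAll` is a stand-in so the example runs with the standard library alone (in practice the candidate would be the replacement library's Unmarshal).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unmarshalFunc matches the signature of encoding/json.Unmarshal, which
// drop-in replacements are expected to share.
type unmarshalFunc func(data []byte, v any) error

// stdlib is the oracle: the strict reference decoder.
var stdlib unmarshalFunc = json.Unmarshal

// acceptAll is a stand-in for a more lenient decoder. It does NOT model any
// real library; it only lets this sketch run without third-party imports.
func acceptAll(data []byte, v any) error { return nil }

// divergences returns the inputs the candidate accepts but the oracle rejects.
func divergences(candidate unmarshalFunc, inputs []string) []string {
	var out []string
	for _, in := range inputs {
		var got, want any
		if candidate([]byte(in), &got) == nil && stdlib([]byte(in), &want) != nil {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	// The second input carries a leading-zero number, which encoding/json rejects.
	inputs := []string{`{"n":1}`, `{"n":01}`}
	fmt.Println("lenient stand-in diverges on:", divergences(acceptAll, inputs))
	fmt.Println("stdlib diverges on:", divergences(stdlib, inputs))
}
```

A test built this way fails as soon as the candidate starts accepting an input outside the documented allow-list.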

3. Request-snapshot allocations

  • Added NewRequestSnapshotWithOwnedMaps so ingress capture owns the freshly-built route/query/trace maps and body, cloning only the live header map.
  • Added zero-copy HeadersView and pointed read-only callers at it.
  • Removed the now-superseded NewRequestSnapshotWithOwnedBody constructor.
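
The ownership split described above can be sketched as follows. The `snapshot` type and the two-argument `newSnapshotWithOwnedMaps` are hypothetical reductions for illustration; the real constructor takes more parameters, and only the accessor names `GetHeaders` and `HeadersView` are taken from this PR.

```go
package main

import (
	"fmt"
	"net/http"
)

// snapshot is a hypothetical, reduced stand-in for the request snapshot.
type snapshot struct {
	routeParams map[string]string
	headers     http.Header
}

// newSnapshotWithOwnedMaps takes ownership of routeParams without copying and
// clones only headers, because the caller keeps using the live header map.
func newSnapshotWithOwnedMaps(routeParams map[string]string, headers http.Header) *snapshot {
	return &snapshot{routeParams: routeParams, headers: headers.Clone()}
}

// GetHeaders returns a defensive copy that callers are free to mutate.
func (s *snapshot) GetHeaders() http.Header { return s.headers.Clone() }

// HeadersView returns the internal map without copying; callers must treat it
// as read-only.
func (s *snapshot) HeadersView() http.Header { return s.headers }

func main() {
	route := map[string]string{"provider": "openai"}
	live := http.Header{"X-Test": {"a"}}
	s := newSnapshotWithOwnedMaps(route, live)

	live.Set("X-Test", "mutated")                 // does not reach the snapshot
	fmt.Println(s.HeadersView().Get("X-Test"))    // still the captured value
	fmt.Println(s.GetHeaders().Get("X-Test"))     // same value, but a fresh copy
}
```

The trade-off is that `HeadersView` saves one map clone per read at the cost of relying on callers not to write through the returned map.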

4. Perf harness

  • The gateway hot-path benchmark previously passed a bare provider to server.New, bypassing the Router/registry entirely — so the perf guard protected a path that doesn't exist in production. It now wires the real Router + a populated catalog, with a guard case. Added a resolution micro-benchmark (resolve_bench_test.go).
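
As an illustration of what an allocation guard checks, here is a small sketch built on `testing.AllocsPerRun` from the standard library. It does not reproduce the repository's guard; `measureAllocs`, `withinCeiling`, the sink variables, and the two workloads (a full catalog copy versus a map lookup) are assumptions chosen to mirror the before/after shapes discussed in this PR.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep the compiler from optimizing the work away.
var (
	sinkSlice []string
	sinkValue string
)

// measureAllocs reports the average number of heap allocations per call of fn.
func measureAllocs(fn func()) float64 {
	return testing.AllocsPerRun(100, fn)
}

// withinCeiling is the guard condition: measured allocations must not exceed
// the configured ceiling.
func withinCeiling(measured, ceiling float64) bool {
	return measured <= ceiling
}

func main() {
	catalog := make([]string, 300)
	index := map[string]string{"openai/gpt-5-mini": "openai"}

	// Copying the catalog allocates a new backing array on every call.
	scan := measureAllocs(func() { sinkSlice = append([]string(nil), catalog...) })
	// A map lookup with an existing key does not need to allocate.
	lookup := measureAllocs(func() { sinkValue = index["openai/gpt-5-mini"] })

	fmt.Printf("catalog copy: %.0f allocs/op, within ceiling 0: %v\n", scan, withinCeiling(scan, 0))
	fmt.Printf("index lookup: %.0f allocs/op, within ceiling 0: %v\n", lookup, withinCeiling(lookup, 0))
}
```

A guard like this is only meaningful if the measured function follows the production path, which is the point of wiring the real Router into the benchmark.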

Impact framing

JSON decode and model resolution are a slice of per-request work (the upstream LLM call dominates wall-clock), so this is primarily a throughput / CPU / GC win across every endpoint, not a dramatic per-request latency drop.

Risks / follow-ups

  • New dependency: github.com/goccy/go-json (MIT, pure Go, v0.10.6) is now on the core hot path — worth a conscious sign-off.
  • Benchmarks ran on darwin/arm64. goccy is pure Go (consistent across platforms), but recommend confirming the win on linux/amd64 (prod arch) in CI before merge.

Test plan

  • make test-race (full suite, 58 packages) — green
  • make lint — green
  • hot-path perf guard — green
  • linux/amd64 benchmark confirmation (CI)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Performance Improvements
    • Improved JSON processing throughput across the service by switching to a faster JSON implementation.
    • Reduced per-request model/provider selection overhead via cached, routed selector resolution.
    • Expanded hot-path benchmarking to measure both bare and routed request flows.
  • Bug Fixes
    • Pinned JSON decoder leniency: a small, documented set of malformed patterns is now tolerated, while invalid syntax is still rejected.
  • Tests
    • Added/updated benchmarks for selector resolution and routed hot-path handling, including updated snapshot/header-related test coverage.

Reduce per-request CPU and allocations on the gateway hot path. Changes are
behavior-preserving for all valid input; the one intentional difference is
documented and tested.

Model resolution (O(1)):
- Add a lazy provider-selector index to the registry (qualifiedByName /
  qualifiedByType), invalidated at the existing single cache-invalidation point.
- Route qualified-selector resolution through it via an optional
  qualifiedSelectorResolver interface, with the catalog scan kept as a fallback
  for non-indexed lookups and raw slash-shaped model IDs.
- Resolution is now O(1) and constant in catalog size (was O(N), copying the
  full catalog several times per request): ~31us/164KB -> ~0.8us/0.3KB at 300
  models. Deduplicated the redundant name/type scans in resolveQualifiedSelector.

JSON decoding (goccy/go-json):
- Migrate internal/ + cmd/ from encoding/json to github.com/goccy/go-json
  (drop-in; package is named json). gjson is unchanged.
- ~3.8x faster realistic chat-body decode with fewer allocations.
- goccy is slightly more lenient than encoding/json on a couple of malformed
  inputs (leading-zero numbers; malformed values in skipped passthrough fields).
  Accepted under the gateway's accept-generously principle and pinned by
  TestDecoderLeniencyIsBounded.
- Drop the redundant gjson.ValidBytes walk in extractUnknownJSONFields (callers
  already validate via the preceding Unmarshal).

Request snapshot allocations:
- Add NewRequestSnapshotWithOwnedMaps so ingress capture owns the freshly built
  route/query/trace maps and body, cloning only the live header map.
- Add HeadersView (zero-copy) and route read-only callers to it.
- Remove the now-superseded NewRequestSnapshotWithOwnedBody constructor.

Perf harness:
- Make the gateway hot-path benchmark exercise the real Router + populated
  catalog (it previously bypassed routing, giving false confidence) and add a
  guard case for it. Add a resolution micro-benchmark.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Too many files changed for review. (126 files found, 100 file limit)

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f4b8cd6-af65-4a5d-ab0a-55c3e3366e5a

📥 Commits

Reviewing files that changed from the base of the PR and between 4f26d43 and 39d477c.

📒 Files selected for processing (3)
  • internal/core/request_snapshot_test.go
  • internal/providers/registry.go
  • internal/providers/resolve_bench_test.go

📝 Walkthrough

The PR replaces encoding/json with github.com/goccy/go-json across internal/ and cmd/ (roughly 125 files in the diff). It replaces NewRequestSnapshotWithOwnedBody with NewRequestSnapshotWithOwnedMaps and adds a zero-copy HeadersView() accessor, removes an internal gjson.ValidBytes pre-check from extractUnknownJSONFields, and adds an O(1) qualified-selector index (qualifiedByName/qualifiedByType) to ModelRegistry with a qualifiedSelectorResolver fast path in Router. It also extends the hot-path benchmarks to cover the routed resolution path.

Changes

go-json migration, RequestSnapshot refactor, O(1) model resolution, and benchmarks

| Layer / File(s) | Summary |
| --- | --- |
| **Dependency addition and RequestSnapshot API refactor**<br>go.mod, internal/core/request_snapshot.go, internal/core/request_snapshot_test.go, internal/server/request_snapshot.go, internal/auditlog/entry_capture.go, internal/responsecache/responsecache.go, internal/providers/xai/xai.go | Adds github.com/goccy/go-json v0.10.6; replaces NewRequestSnapshotWithOwnedBody with NewRequestSnapshotWithOwnedMaps (owns all maps, only clones headers); adds HeadersView() zero-copy accessor; migrates internalJSONAuditHeaders, internalRequestHeaders, and xGrokConversationIDFromSnapshot from GetHeaders() to HeadersView(). |
| **json_fields behavioral change and leniency tests**<br>internal/core/json_fields.go, internal/core/json_fields_test.go | Switches to go-json, removes internal gjson.ValidBytes pre-validation from extractUnknownJSONFields (relies on caller-guaranteed valid JSON), replaces the invalid-syntax test to target ChatRequest.UnmarshalJSON, and adds TestDecoderLeniencyIsBounded pinning two accepted go-json leniencies in passthrough fields. |
| **ModelRegistry O(1) selector index**<br>internal/providers/registry.go, internal/providers/registry_metadata.go | Adds qualifiedByName and qualifiedByType cached maps; implements ResolveProviderSelector with lazy index building under write lock and O(1) read-lock lookup; introduces buildSelectorIndexLocked with deterministic collision handling and lookupSelectorIndex helper; clears index maps on cache invalidation; swaps JSON imports to go-json. |
| **Router qualifiedSelectorResolver fast path**<br>internal/providers/router.go | Adds unexported qualifiedSelectorResolver interface; updates resolveQualifiedSelector to attempt fast-path lookup first and fall back to resolveProviderOwnedRawSelector scan when unavailable; swaps JSON imports to go-json. |
| **Resolution and hot-path performance benchmarks**<br>internal/providers/resolve_bench_test.go, tests/perf/hotpath_test.go, tests/perf/README.md | Adds BenchmarkResolvePerRequest and BenchmarkListModelsWithProvider with buildBenchRegistry helper; extends hotpath_test.go with benchProvider.models field, newRoutedBenchServer factory, BenchmarkGatewayHotPathChatCompletionRouted, and a new routed-path ceiling in TestHotPathPerfGuard; documents bare vs. routed benchmark differences in README. |
| **Global encoding/json → goccy/go-json import swap**<br>cmd/..., internal/admin/..., internal/aliases/..., internal/anthropicapi/..., internal/app/..., internal/auditlog/..., internal/batch/..., internal/cache/..., internal/conversationstore/..., internal/core/..., internal/embedding/..., internal/gateway/..., internal/guardrails/..., internal/live/..., internal/llmclient/..., internal/modeldata/..., internal/modeloverrides/..., internal/pricingoverrides/..., internal/providers/..., internal/responsecache/..., internal/responsestore/..., internal/server/..., internal/streaming/..., internal/usage/..., internal/workflows/... | Replaces encoding/json with github.com/goccy/go-json (imported as json) across all remaining files; all existing json.Marshal, json.Unmarshal, json.RawMessage, json.NewDecoder, json.Valid, and json.Number usages now resolve to the new library. |

Sequence Diagram(s)

Model resolution now supports an optional O(1) fast path when ModelRegistry provides a cached selector index, falling back to catalog scan when the resolver is unavailable:

sequenceDiagram
  participant req as HTTP Request
  participant router as Router
  participant resolver as ModelRegistry<br/>qualifiedSelectorResolver
  participant fallback as resolveProviderOwnedRawSelector
  participant found as core.ModelSelector

  req->>router: resolveQualifiedSelector(segment, modelID)
  router->>resolver: ResolveProviderSelector(segment, modelID)
  alt fast path available
    resolver->>resolver: RLock → lookupSelectorIndex
    alt index cache hit
      resolver-->>router: ModelSelector, ok=true
    else index miss
      resolver->>resolver: WLock → buildSelectorIndexLocked
      resolver->>resolver: populate qualifiedByName, qualifiedByType
      resolver-->>router: ModelSelector, ok=true/false
    end
  else no resolver (fallback)
    router->>fallback: scan catalog for match
    fallback-->>router: ModelSelector, ok=true/false
  end
  router-->>found: matched provider selector
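
The optional-interface pattern in the diagram can be sketched in Go as below. This is a simplified illustration, assuming the fallback scan runs only when the source does not implement the fast path, as the diagram shows. The `selector`, `catalog`, `scanOnly`, `indexed`, and `newIndexed` names are hypothetical; `qualifiedSelectorResolver`, `ResolveProviderSelector`, and `resolveQualifiedSelector` are borrowed from the walkthrough, but their signatures here are assumptions.

```go
package main

import "fmt"

type selector struct{ Provider, Model string }

// catalog is the baseline capability every lookup source provides.
type catalog interface {
	List() []selector
}

// qualifiedSelectorResolver is the optional fast path; a source may or may
// not implement it.
type qualifiedSelectorResolver interface {
	ResolveProviderSelector(segment, modelID string) (selector, bool)
}

// scanOnly implements only the baseline interface.
type scanOnly struct{ models []selector }

func (s scanOnly) List() []selector { return s.models }

// indexed additionally implements the fast path using a prebuilt map.
type indexed struct {
	scanOnly
	byKey map[string]selector
}

func newIndexed(models []selector) indexed {
	idx := indexed{scanOnly: scanOnly{models: models}, byKey: make(map[string]selector, len(models))}
	for _, s := range models {
		idx.byKey[s.Provider+"/"+s.Model] = s
	}
	return idx
}

func (i indexed) ResolveProviderSelector(segment, modelID string) (selector, bool) {
	s, ok := i.byKey[segment+"/"+modelID]
	return s, ok
}

// resolveQualifiedSelector prefers the fast path when the source offers it and
// otherwise falls back to a linear scan of the catalog.
func resolveQualifiedSelector(src catalog, segment, modelID string) (selector, bool) {
	if fast, ok := src.(qualifiedSelectorResolver); ok {
		return fast.ResolveProviderSelector(segment, modelID)
	}
	for _, s := range src.List() {
		if s.Provider == segment && s.Model == modelID {
			return s, true
		}
	}
	return selector{}, false
}

func main() {
	models := []selector{{Provider: "openai", Model: "gpt-5-mini"}}
	fmt.Println(resolveQualifiedSelector(scanOnly{models: models}, "openai", "gpt-5-mini"))
	fmt.Println(resolveQualifiedSelector(newIndexed(models), "openai", "gpt-5-mini"))
}
```

Keeping the fast path behind a type assertion means existing sources (including test doubles) keep working without implementing the new method.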

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • ENTERPILOT/GoModel#158: The main PR's go-json import swap in internal/streaming/observed_sse_stream.go directly affects the JSON unmarshalling path used by the shared ObservedSSEStream/observer SSE parsing introduced in that PR.
  • ENTERPILOT/GoModel#293: Main PR's JSON implementation swap in internal/usage/cost.go (including json.Number/numeric parsing paths) directly impacts the OpenRouter CalculateUsageCost logic introduced in that PR.
  • ENTERPILOT/GoModel#389: Main PR's JSON implementation swap in internal/core/chat_content.go directly overlaps with #389's new/updated input_audio validation and marshaling logic in the same file.

Poem

🐇 Hop hop, import swap complete,
No more stdlib JSON, what a feat!
go-json now zips through every byte,
O(1) routing gleaming bright.
My carrot cache resolves in a blink —
Faster than you'd dare to think! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately and concisely summarizes the main optimizations: faster model resolution (O(1)), JSON decoding (goccy library migration), and request snapshotting (new owned-maps constructor). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@codecov-commenter

codecov-commenter commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 86.84211% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/providers/registry.go | 87.75% | 3 Missing and 3 partials ⚠️ |
| internal/core/request_snapshot.go | 88.23% | 1 Missing and 1 partial ⚠️ |
| internal/providers/router.go | 66.66% | 1 Missing and 1 partial ⚠️ |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/core/request_snapshot_test.go (1)

74-106: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Expand owned-maps test coverage to include map/header semantics.

The renamed test currently pins only captured-body ownership. Please also assert that owned route/query/trace maps are not cloned, while headers are still cloned, to lock in the full constructor contract.

💡 Suggested test extension
 func TestNewRequestSnapshotWithOwnedMaps_TakesOwnershipOfCapturedBytes(t *testing.T) {
+	routeParams := map[string]string{"provider": "openai"}
+	queryParams := map[string][]string{"limit": {"5"}}
+	headers := map[string][]string{"X-Test": {"a"}}
+	traceMetadata := map[string]string{"Traceparent": "trace-1"}
 	rawBody := []byte(`{"model":"gpt-5-mini"}`)
 
 	snapshot := NewRequestSnapshotWithOwnedMaps(
 		"POST",
 		"/v1/chat/completions",
-		nil,
-		nil,
-		nil,
+		routeParams,
+		queryParams,
+		headers,
 		"application/json",
 		rawBody,
 		false,
 		"req-123",
-		nil,
+		traceMetadata,
 		"/team/a",
 	)
@@
 	if &clonedBody[0] == &rawBody[0] {
 		t.Fatal("CapturedBody returned owned bytes directly, want defensive copy")
 	}
+
+	routeParams["provider"] = "anthropic"
+	if got := snapshot.GetRouteParams()["provider"]; got != "anthropic" {
+		t.Fatalf("GetRouteParams provider = %q, want anthropic (owned map)", got)
+	}
+	queryParams["limit"][0] = "99"
+	if got := snapshot.GetQueryParams()["limit"][0]; got != "99" {
+		t.Fatalf("GetQueryParams limit = %q, want 99 (owned map)", got)
+	}
+	traceMetadata["Traceparent"] = "trace-2"
+	if got := snapshot.GetTraceMetadata()["Traceparent"]; got != "trace-2" {
+		t.Fatalf("GetTraceMetadata Traceparent = %q, want trace-2 (owned map)", got)
+	}
+	headers["X-Test"][0] = "mutated"
+	if got := snapshot.GetHeaders()["X-Test"][0]; got != "a" {
+		t.Fatalf("GetHeaders X-Test = %q, want a (cloned headers)", got)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/core/request_snapshot_test.go` around lines 74 - 106, The
TestNewRequestSnapshotWithOwnedMaps_TakesOwnershipOfCapturedBytes test currently
only validates captured body ownership but does not test the complete contract
of the NewRequestSnapshotWithOwnedMaps constructor regarding map and header
semantics. Add assertions to verify that route, query, and trace maps passed to
NewRequestSnapshotWithOwnedMaps are not cloned by the snapshot (confirming
ownership is taken), while headers should still be defensively cloned to prevent
external mutations. Create sample maps for these fields, pass them through the
constructor, retrieve them via appropriate accessor methods, and verify the
pointer equality or inequality as needed to confirm the ownership vs cloning
behavior for each map type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/providers/registry.go`:
- Around line 174-183: The code builds selector-index keys using potentially
untrimmed values for publicName and info.ProviderType, while the lookup paths
(lines 126-127) use trimmed inputs. Apply strings.TrimSpace() to normalize
publicName when it is assigned from info.ProviderName, and also apply
strings.TrimSpace() to info.ProviderType before using it in the typeKey
construction. This ensures that keys built during registration match keys used
during lookups even when config/provider metadata contains whitespace padding.

In `@internal/providers/resolve_bench_test.go`:
- Around line 33-35: The benchmark test case labels are mislabeled due to
integer truncation in the buildBenchRegistry call. In the benchmark loop that
iterates over slice values (50, 300, 1000), the calculation n/6 for the
per-provider count causes truncation, resulting in actual model counts that
differ from the label (models=50 actually benchmarks 48 models, models=1000
actually benchmarks 996 models). Fix this by either changing the loop values to
numbers divisible by 6 such that when multiplied back (divideCount * 6) they
equal the intended model count, or recalculate the label to show the actual
number of models being created by computing divideCount * 6 and using that value
in the b.Run label instead of n. Apply the same fix to the second benchmark loop
also mentioned in the comment.

In `@tests/perf/README.md`:
- Around line 26-29: The README.md file contains outdated performance
documentation in the section describing the routed path (around lines 26-29).
The current text still references repeated full-catalog copies per request, but
the actual implementation now uses O(1) selector-index behavior where resolution
is computed once per request and reused. Update the routed path performance
explanation to accurately reflect this current behavior by removing or revising
the outdated statement about order of magnitude allocations from repeated
catalog copies, and instead describe how resolution is now computed once and
reused, which provides the O(1) performance characteristics referenced in the
perf-guard commentary.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cbf95ee6-7e26-4f2d-b8a7-0d7f47aad549

📥 Commits

Reviewing files that changed from the base of the PR and between 9025183 and 1283ab9.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (125)
  • cmd/gomodel/health.go
  • cmd/recordapi/main.go
  • go.mod
  • internal/admin/handler_guardrails.go
  • internal/admin/handler_live.go
  • internal/aliases/batch_preparer.go
  • internal/anthropicapi/request.go
  • internal/anthropicapi/response.go
  • internal/anthropicapi/stream.go
  • internal/anthropicapi/types.go
  • internal/app/app.go
  • internal/auditlog/auditlog.go
  • internal/auditlog/entry_capture.go
  • internal/auditlog/middleware.go
  • internal/auditlog/reader_postgresql.go
  • internal/auditlog/reader_sqlite.go
  • internal/batch/store.go
  • internal/cache/modelcache/local.go
  • internal/cache/modelcache/modelcache.go
  • internal/cache/modelcache/redis.go
  • internal/conversationstore/store.go
  • internal/conversationstore/store_memory.go
  • internal/core/audio.go
  • internal/core/batch.go
  • internal/core/batch_json.go
  • internal/core/batch_preparation.go
  • internal/core/chat_content.go
  • internal/core/chat_json.go
  • internal/core/conversations.go
  • internal/core/embeddings_encoding.go
  • internal/core/embeddings_json.go
  • internal/core/errors.go
  • internal/core/json_fields.go
  • internal/core/json_fields_test.go
  • internal/core/message_json.go
  • internal/core/request_snapshot.go
  • internal/core/request_snapshot_test.go
  • internal/core/responses.go
  • internal/core/responses_json.go
  • internal/core/semantic_canonical.go
  • internal/core/types.go
  • internal/core/usage_json.go
  • internal/embedding/embedding.go
  • internal/gateway/batch_usage.go
  • internal/guardrails/batch_rewrite.go
  • internal/guardrails/batch_rewrite_test.go
  • internal/guardrails/definitions.go
  • internal/guardrails/executor.go
  • internal/guardrails/responses_message_apply.go
  • internal/guardrails/store_mongodb.go
  • internal/live/broker.go
  • internal/llmclient/client.go
  • internal/modeldata/fetcher.go
  • internal/modeloverrides/batch_preparer.go
  • internal/modeloverrides/store.go
  • internal/modeloverrides/store_postgresql.go
  • internal/modeloverrides/store_sqlite.go
  • internal/pricingoverrides/store.go
  • internal/pricingoverrides/store_postgresql.go
  • internal/pricingoverrides/store_sqlite.go
  • internal/providers/anthropic/anthropic.go
  • internal/providers/anthropic/batch.go
  • internal/providers/anthropic/chat.go
  • internal/providers/anthropic/chat_stream.go
  • internal/providers/anthropic/request_translation.go
  • internal/providers/anthropic/responses.go
  • internal/providers/anthropic/types.go
  • internal/providers/bailian/bailian.go
  • internal/providers/batch_results_file_adapter.go
  • internal/providers/bedrock/chat.go
  • internal/providers/bedrock/chat_stream.go
  • internal/providers/chat_stream_normalize.go
  • internal/providers/deepseek/deepseek.go
  • internal/providers/gemini/gemini.go
  • internal/providers/gemini/native.go
  • internal/providers/gemini/native_stream.go
  • internal/providers/googlecommon/auth.go
  • internal/providers/ollama/ollama.go
  • internal/providers/openai/openai.go
  • internal/providers/registry.go
  • internal/providers/registry_metadata.go
  • internal/providers/resolve_bench_test.go
  • internal/providers/responses_adapter.go
  • internal/providers/responses_content.go
  • internal/providers/responses_converter.go
  • internal/providers/responses_input.go
  • internal/providers/responses_output.go
  • internal/providers/responses_output_state.go
  • internal/providers/router.go
  • internal/providers/vertex/vertex.go
  • internal/providers/xai/xai.go
  • internal/providers/xiaomi/audio.go
  • internal/responsecache/responsecache.go
  • internal/responsecache/semantic.go
  • internal/responsecache/simple.go
  • internal/responsecache/sse_validation.go
  • internal/responsecache/stream_cache.go
  • internal/responsecache/stream_cache_chat.go
  • internal/responsecache/stream_cache_responses.go
  • internal/responsecache/vecstore_pinecone.go
  • internal/responsecache/vecstore_qdrant.go
  • internal/responsecache/vecstore_weaviate.go
  • internal/responsestore/store.go
  • internal/server/conversation_responses.go
  • internal/server/internal_chat_completion_executor.go
  • internal/server/native_conversation_service.go
  • internal/server/native_response_service.go
  • internal/server/request_selector_peek.go
  • internal/server/request_snapshot.go
  • internal/server/response_input_items.go
  • internal/server/translated_inference_service.go
  • internal/streaming/observed_sse_stream.go
  • internal/usage/audio.go
  • internal/usage/cost.go
  • internal/usage/extractor.go
  • internal/usage/reader_postgresql.go
  • internal/usage/reader_sqlite.go
  • internal/usage/realtime.go
  • internal/usage/recalculate_pricing.go
  • internal/usage/store_sqlite.go
  • internal/workflows/store_postgresql.go
  • internal/workflows/store_sqlite.go
  • internal/workflows/types.go
  • tests/perf/README.md
  • tests/perf/hotpath_test.go

Comment thread internal/providers/registry.go Outdated
Comment thread internal/providers/resolve_bench_test.go Outdated
Comment thread tests/perf/README.md
Comment on lines +26 to +29
covers the per-request resolution path. This routed path currently allocates an
order of magnitude more per request because resolution re-copies the full model
catalog several times; its guard ceilings should tighten significantly once
resolution is computed once per request and reused.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

README performance explanation is outdated for the routed path.

Lines 26-29 still describe repeated full-catalog copies per request, but the routed perf-guard commentary now assumes O(1) selector-index behavior. This mismatch can mislead perf investigations.

Suggested fix
-`BenchmarkGatewayHotPathChatCompletionRouted` wires a real `Router` +
-`ModelRegistry` (the production shape) with a representative catalog, so it
-covers the per-request resolution path. This routed path currently allocates an
-order of magnitude more per request because resolution re-copies the full model
-catalog several times; its guard ceilings should tighten significantly once
-resolution is computed once per request and reused.
+`BenchmarkGatewayHotPathChatCompletionRouted` wires a real `Router` +
+`ModelRegistry` (the production shape) with a representative catalog, so it
+covers the per-request resolution path. With the selector-index fast path,
+routed overhead should stay close to the bare-provider case and avoid
+catalog-size-linear per-request copying for qualified-selector resolution.

SantiagoDePolonia and others added 2 commits June 18, 2026 18:52
CI (linux/amd64) and local (darwin/arm64) produce identical allocation counts
and near-identical byte counts, confirming these are deterministic. Tighten the
ceilings from "intentionally generous" to ~10% over the measured baseline so the
guard catches smaller regressions while still absorbing Go/dependency drift:

  hot_path:          125 -> 120 allocs            (baseline 113)
  routed:            160 -> 150 allocs, 18->16 KB (baseline 137 / ~14.7 KB)
  responses_stream:  310 -> 222 allocs, 25->22 KB (baseline 202 / ~19.6 KB)
  shared_observers:  unchanged (already tight, no headroom to trim)
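
For reference, the headroom each new ceiling leaves over its measured baseline can be computed directly from the numbers listed above. The snippet is only a worked calculation; `headroomPercent` is a hypothetical helper, not part of the perf guard.

```go
package main

import "fmt"

// headroomPercent returns how far the ceiling sits above the baseline, in percent.
func headroomPercent(baseline, ceiling float64) float64 {
	return (ceiling - baseline) / baseline * 100
}

func main() {
	// Baselines and new ceilings as listed in the commit message above.
	rows := []struct {
		name              string
		baseline, ceiling float64
	}{
		{"hot_path allocs", 113, 120},
		{"routed allocs", 137, 150},
		{"routed KB", 14.7, 16},
		{"responses_stream allocs", 202, 222},
		{"responses_stream KB", 19.6, 22},
	}
	for _, r := range rows {
		fmt.Printf("%-24s %5.1f%% headroom\n", r.name, headroomPercent(r.baseline, r.ceiling))
	}
}
```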

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- registry: trim publicName/ProviderType when building the qualified-selector
  index and skip empty keys, matching the trimmed lookup inputs and the previous
  catalog scan (which compared trimmed fields on both sides). Prevents the O(1)
  fast path from missing when provider metadata carries whitespace padding.
- resolve_bench_test: build exactly totalModels (round-robin across providers)
  instead of providersN*(n/6); the models=50/1000 cases previously benchmarked
  48/996 models due to integer truncation. Add benchSelector helper.
- request_snapshot_test: extend the owned-maps test to assert route/query/trace
  maps are owned (not cloned) while headers are still defensively cloned.
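
The truncation fixed in the resolve_bench_test item above can be reproduced in a few lines. This is an illustrative sketch: `truncatedTotal`, `roundRobinCounts`, and `sum` are hypothetical helpers, and the provider count of 6 is taken from the n/6 expression quoted in the review.

```go
package main

import "fmt"

// truncatedTotal mirrors providers*(total/providers): integer division drops
// the remainder, so the built catalog can be smaller than the label claims.
func truncatedTotal(total, providers int) int {
	return providers * (total / providers)
}

// roundRobinCounts hands out exactly total models, one provider at a time.
func roundRobinCounts(total, providers int) []int {
	counts := make([]int, providers)
	for i := 0; i < total; i++ {
		counts[i%providers]++
	}
	return counts
}

// sum adds up the per-provider counts.
func sum(counts []int) int {
	n := 0
	for _, c := range counts {
		n += c
	}
	return n
}

func main() {
	for _, total := range []int{50, 300, 1000} {
		fmt.Printf("label=%d truncated=%d round-robin=%d\n",
			total, truncatedTotal(total, 6), sum(roundRobinCounts(total, 6)))
	}
}
```

With round-robin assignment the benchmark label and the actual catalog size always agree, regardless of whether the total divides evenly by the provider count.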

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@SantiagoDePolonia SantiagoDePolonia merged commit 2677c1f into main Jun 18, 2026
19 checks passed